The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction

Authors

  • Roman Grundkiewicz
  • Marcin Junczys-Dowmunt
Abstract

This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, corpus content and format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical error correction for English-as-a-Second-Language (ESL) learners’ writings by 2.63%. Used together with an ESL error corpus, a composed system gains 1.64% when compared to the ESL-trained system.
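
To make the notion of an "edit" between two revisions of a sentence concrete, here is a minimal sketch that diffs an older and a newer version of a sentence at the word level. It is not the authors' extraction pipeline: the function name `extract_edits`, the whitespace tokenization, and the use of Python's standard `difflib` module are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def extract_edits(old_sentence, new_sentence):
    """Return word-level (operation, old_words, new_words) tuples.

    Hypothetical helper for illustration only; whitespace tokenization
    is a simplifying assumption.
    """
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans are not edits
        edits.append(
            (tag, " ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2]))
        )
    return edits


if __name__ == "__main__":
    print(extract_edits("He go to school yesterday",
                        "He went to school yesterday"))
```

Each tuple carries the `difflib` operation (`replace`, `insert`, or `delete`) together with the affected words in the old and new versions. A real corpus-building process would also need sentence alignment across revisions and filtering of vandalism or non-corrective changes, which this sketch does not attempt.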

Similar resources

JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and bench...

The Effect of Focused Corrective Feedback and Attitude on Grammatical Accuracy: A Study of Iranian EFL Learners

The study aimed to investigate the efficacy of written corrective feedback (CF) in improving Iranian EFL learners’ grammatical accuracy. It compared the effects of focused and unfocused written CF on the learners’ grammatical accuracy. Seventy-five EFL students formed one control group and two experimental groups. The focused feedback group was provided with error correction in tenses. The unfocus...

Automatically Classifying Edit Categories in Wikipedia Revisions

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machin...

An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions

This paper presents a proof-of-concept tool for providing automated explicit feedback to language learners based on data mined from Wikipedia revisions. The tool takes a sentence with a grammatical error as input and displays a ranked list of corrections for that error along with evidence to support each correction choice. We use lexical and part-of-speech contexts, as well as query expansion w...

Publication date: 2014